Structuring Terminology: between Lexicons and Domain Knowledge Representation

Authors

  • Wolfgang Menzel
  • Cristina Vertan
  • Galia Angelova
Abstract

This article revisits the project DB-MAT and its approach to modelling multilingual terminology with a single knowledge base.

1 German-Bulgarian terminology in DB-MAT

The project DB-MAT (1992-1995) aimed at the design and implementation of a translators' workbench that provides linguistic and domain explanations to human translators, within the paradigm of knowledge-based Machine Aided Translation (MAT) [1]. Most generally, the innovative idea is to integrate a domain model (a knowledge base of conceptual graphs) into the MAT workbench and to generate explanations on the fly when the translator highlights unknown terms in the source text. The project dealt with German and Bulgarian, which raised the question of how to link the corresponding entries of the bilingual lexicon to the Knowledge Base (KB) entities. Figure 1 presents an early model of the pointers between the lexicon and the KB items.

In principle, keeping phrasal lexicons is an acceptable strategy for supporting multilingual terminology (in the 1980s, several projects and prototypes of the so-called "knowledge-based term banks" seemed to approach the issue in a similar way). However, in DB-MAT especially, it became clear that the picture shown in Figure 1 has two potential "defects". First, it contains repeated information in the lexicons (see for instance all Bulgarian noun phrases that include the word court). Second, the Bulgarian noun phrases have to be inflected during surface verbalisation according to complicated grammar rules, since the article in Bulgarian is attached to the end of the noun or of the preceding adjective. It therefore turned out that the generation grammar for Bulgarian would work more easily with non-phrasal lexicons, since these inflection rules have to be supported anyway.
In this way, to avoid repeating information in the (phrasal) lexicons and to provide more uniform treatment and even some elegance in the process of multilingual

Footnote 1: between Hamburg University (NATS) and the Linguistic Modelling Department, Central Laboratory for Parallel Processing, Bulgarian Academy of Sciences, funded by Volkswagen Foundation (Germany)
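The contrast between phrasal and non-phrasal lexicons described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the concept names, the English placeholder phrases (standing in for the Bulgarian noun phrases), the toy knowledge base, and the helpers `count_word` and `verbalise` are hypothetical and are not taken from DB-MAT or from Figure 1. The sketch also ignores the inflection rules that a real generation grammar would apply.

```python
# Hypothetical data: concept names and phrases are illustrative, not from DB-MAT.

# Phrasal lexicon: each multiword term is stored whole and points to one KB concept.
phrasal_lexicon = {
    "court": "COURT",
    "supreme court": "SUPREME_COURT",
    "district court": "DISTRICT_COURT",
}

# Non-phrasal lexicon: only single words are stored; each points to one KB concept.
word_lexicon = {
    "court": "COURT",
    "supreme": "SUPREME",
    "district": "DISTRICT",
}

# Toy KB: a complex concept is described by a head concept plus an attribute concept.
kb = {
    "SUPREME_COURT": ("COURT", "SUPREME"),
    "DISTRICT_COURT": ("COURT", "DISTRICT"),
}


def count_word(lexicon, word):
    """Count how many lexicon entries contain the given word."""
    return sum(word in entry.split() for entry in lexicon)


def verbalise(concept):
    """Build a surface phrase for a KB concept from single-word entries only."""
    concept_to_word = {c: w for w, c in word_lexicon.items()}
    if concept in concept_to_word:
        return concept_to_word[concept]
    head, attribute = kb[concept]
    return f"{concept_to_word[attribute]} {concept_to_word[head]}"


print(count_word(phrasal_lexicon, "court"))  # the word is repeated in every phrase
print(count_word(word_lexicon, "court"))     # the word is stored once
print(verbalise("SUPREME_COURT"))            # phrase composed from the KB structure
```

In this toy setting the repeated word appears in every phrasal entry but only once in the non-phrasal lexicon, and the phrase is composed from the KB structure at generation time, which is the point where a grammar could also apply inflection.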

Similar resources

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique given a large amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. It is effective in image classification, where it is expensive and time-consuming to obtain adequately labeled data. We propose a novel method named DALRRL, which consists of deep ...

Automatic Acquisition of Meaning Elements for the Creation of Semantic Lexicons

This paper presents, in a unified way, two new trends in natural language processing, that is a new kind of lexicons that are cornerstones of a lot of current natural language applications which tackle the problem of meaning, and different corpus-based lexical knowledge acquisition studies that have emerged with the big amounts of electronic texts available on the nets. More precisely, this pap...

Efficient Data Selection for Bilingual Terminology Extraction from Comparable Corpora

Comparable corpora are the main alternative to the use of parallel corpora to extract bilingual lexicons. Although it is easier to build comparable corpora, specialized comparable corpora are often of modest size in comparison with corpora issued from the general domain. Consequently, the observations of word co-occurrences which are the basis of context-based methods are unreliable. We propose...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Semi-Automatic Acquisition of Domain-Specific Translation Lexicons

We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons.

1 Introduction

Reliable translation lexicons are useful in many applications, such as cross-langua...

Publication date: 2013